Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
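The abstract's core idea, treating the DA sequence as hidden states of an HMM with a DA n-gram as the transition model and per-utterance lexical/prosodic likelihoods as emissions, can be sketched with a toy Viterbi decoder. All tag sets and probabilities below are invented for illustration; the paper's actual models are trained on Switchboard.

```python
import math

# Hidden states are dialogue acts (DAs); observations are per-utterance
# likelihoods P(utterance | DA) from lexical/prosodic models; transitions
# come from a DA bigram "discourse grammar". Numbers are illustrative only.

DAS = ["Statement", "Question", "Backchannel"]

# P(next DA | previous DA): the DA bigram.
TRANS = {
    "Statement":   {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2},
    "Question":    {"Statement": 0.7, "Question": 0.1, "Backchannel": 0.2},
    "Backchannel": {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1},
}
INIT = {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1}

def viterbi(obs_likelihoods):
    """obs_likelihoods: list of dicts mapping DA -> P(utterance | DA)."""
    v = [{da: math.log(INIT[da]) + math.log(obs_likelihoods[0][da])
          for da in DAS}]
    back = []
    for obs in obs_likelihoods[1:]:
        col, ptr = {}, {}
        for da in DAS:
            prev = max(DAS, key=lambda p: v[-1][p] + math.log(TRANS[p][da]))
            col[da] = v[-1][prev] + math.log(TRANS[prev][da]) + math.log(obs[da])
            ptr[da] = prev
        v.append(col)
        back.append(ptr)
    # Trace back the best-scoring DA sequence.
    best = max(DAS, key=lambda da: v[-1][da])
    seq = [best]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))

# Three utterances with question-like, statement-like, and
# backchannel-like acoustic/lexical evidence, respectively.
obs = [
    {"Statement": 0.2, "Question": 0.7, "Backchannel": 0.1},
    {"Statement": 0.8, "Question": 0.1, "Backchannel": 0.1},
    {"Statement": 0.1, "Question": 0.1, "Backchannel": 0.8},
]
print(viterbi(obs))  # -> ['Question', 'Statement', 'Backchannel']
```

The discourse grammar matters when the local evidence is ambiguous: a weakly question-like utterance after a question is pulled toward Statement by the strong Question-to-Statement transition.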
Automatic detection of discourse structure for speech recognition and understanding.
We describe a new approach for statistical modeling and detection of discourse structure
for natural conversational speech. Our model is based on 42 "Dialog Acts" (DAs)
(question, answer, backchannel, agreement, disagreement, apology, etc.). We labeled
1155 conversations from the Switchboard (SWBD) database (Godfrey et al. 1992) of
human-to-human telephone conversations with these 42 types and trained a Dialog Act
detector based on three distinct knowledge sources: sequences of words which characterize
a dialog act, prosodic features which characterize a dialog act, and a statistical
Discourse Grammar. Our combined detector, although still in preliminary stages, already
achieves a 65% Dialog Act detection rate based on acoustic waveforms, and 72%
accuracy based on word transcripts. Using this detector to switch among the 42 Dialog-
Act-Specific trigram LMs also gave us an encouraging but not statistically significant
reduction in SWBD word error.
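The switching scheme mentioned at the end, using the DA detector to choose among DA-specific LMs for rescoring, can be sketched as a posterior-weighted mixture of per-DA language models. The posteriors, vocabulary, and probabilities below are invented, and unigram LMs stand in for the paper's trigram LMs.

```python
# Mix DA-specific language models, weighting each by the detector's
# posterior over dialog acts. All numbers are illustrative only.

da_posterior = {"Question": 0.7, "Statement": 0.3}

# Per-DA unigram LMs standing in for DA-specific trigram LMs.
lm = {
    "Question":  {"do": 0.4, "you": 0.4, "yeah": 0.2},
    "Statement": {"do": 0.1, "you": 0.3, "yeah": 0.6},
}

def mixed_prob(word):
    """P(word) = sum over DAs of P(DA | evidence) * P(word | DA)."""
    return sum(da_posterior[da] * lm[da].get(word, 0.0) for da in da_posterior)

print(round(mixed_prob("do"), 3))    # 0.7*0.4 + 0.3*0.1 = 0.31
print(round(mixed_prob("yeah"), 3))  # 0.7*0.2 + 0.3*0.6 = 0.32
```

Hard switching is the special case where the top DA gets posterior 1.0; soft mixing degrades more gracefully when the DA detector is uncertain.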
Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?
Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study examines over 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy, and speaking-rate features) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. For an all-way classification as well as three subtasks, prosody allowed highly significant classification over chance. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.
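The study's decision-tree approach can be illustrated with a minimal entropy-based split on a single prosodic feature. The feature (final F0 slope, rising for questions), values, and labels below are invented; the actual trees use many features and much larger data.

```python
import math

# Toy induction of a one-node decision tree ("stump") over a prosodic
# feature: pick the threshold minimizing weighted child entropy.
# Feature values and labels are invented for illustration.

train = [  # (final_f0_slope, DA label); values must be distinct here
    (0.9, "Question"), (0.7, "Question"), (0.5, "Question"),
    (-0.2, "Statement"), (-0.6, "Statement"), (0.1, "Statement"),
]

def entropy(labels):
    n = len(labels)
    counts = {l: labels.count(l) for l in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_threshold(data):
    """Try midpoints between consecutive sorted feature values."""
    xs = sorted(x for x, _ in data)
    best, best_score = None, float("inf")
    for t in [(a + b) / 2 for a, b in zip(xs, xs[1:])]:
        left = [l for x, l in data if x <= t]
        right = [l for x, l in data if x > t]
        score = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(data)
        if score < best_score:
            best, best_score = t, score
    return best

t = best_threshold(train)

def classify(f0_slope):
    return "Question" if f0_slope > t else "Statement"

print(t, classify(0.8), classify(-0.4))
```

On this toy data the learned threshold (0.3) perfectly separates rising from flat-or-falling contours; the paper's "less obvious features compensate" finding corresponds to the tree falling back to other features when F0 is removed.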
Generating Event Descriptions with SAGE: a Simulation . . .
The SAGE system (Simulation and Generation Environment) was developed to address issues at the interface between conceptual modelling and natural language generation. In this paper, I describe SAGE and its components in the context of event descriptions. I show how kinds of information, such as the Reichenbachian temporal points and event structure, which are usually treated as unified systems, are often best represented at multiple levels in the overall system. SAGE is composed of a knowledge representation language and simulator, which form the underlying model and constitute the "speaker"; a graphics component which displays the actions of the simulator and provides an anchor for locative and deictic relations; and the generator SPOKESMAN, which produces a textual narration of events.
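The three-component architecture described above, simulator as speaker, graphics as deictic anchor, generator as narrator, can be caricatured as a pipeline. All names, event tuples, and templates below are invented stand-ins, not the actual SAGE/SPOKESMAN code.

```python
# Illustrative pipeline mirroring SAGE's division of labor:
# simulator events flow to a display (deictic anchor) and a narrator.
# Event representation and templates are invented for illustration.

events = [("move", "robot", "kitchen"), ("grasp", "robot", "cup")]

def display(event):
    """Stand-in for the graphics component: track what is on screen."""
    return {"on_screen": event[1:]}

def narrate(event):
    """Stand-in for the generator: one clause per simulator event."""
    action, agent, obj = event
    if action == "grasp":
        return f"The {agent} {action}s the {obj}."
    return f"The {agent} {action}s to the {obj}."

for e in events:
    display(e)
    print(narrate(e))
```

The point of the real system, lost in this caricature, is that temporal and event-structural information is distributed across these levels rather than held in one unified representation.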
The generation gap: The problem of expressibility in text planning
This thesis identifies and provides a solution for a particular problem in natural language generation: the problem of ensuring the expressibility of a text plan. Natural language generation is the process of going from a representation of a situation to a textual expression of some relevant portion of that situation in a natural language. Generation systems must have a principled way of ensuring that the message composed by the text planner is expressible in language, that is, that there are linguistic resources (words, syntactic structures) available for the linguistic component to realize the elements of the plan, and their composition is in accordance with the rules of composition in the language. I have addressed the problem of expressibility by designing a level of representation, the Text Structure, which is used by the text planner in composing the utterance. This intermediate level of representation bridges the generation gap between the representation of the world in the application program and the linguistic resources provided by the language. The terms and expressions in the Text Structure are abstractions over the concrete resources of language (the words, morphological markings, syntactic structures, etc. that actually appear in a stream of text). These abstract linguistic resources group together the expressible combinations of concrete linguistic resources. I have identified three kinds of information that are essential to an abstract linguistic representation: the constituency, the semantic category of the constituent (e.g. event, property), and the structural relations among the constituents (e.g. argument, adjunct). By providing the planner with a set of abstract resources, rather than letting it choose from the individual features that make them up, it is prevented from choosing a set of features that is not realizable. 
These abstractions can further constrain composition by defining what kinds of constituents can be extended and how semantic categories can compose. Text Structure is implemented in the Spokesman Generation System, which produces text for a variety of application programs. I describe in detail the structures in Spokesman's text planner and walk through an example of the generation of a biographical paragraph from the Main Street simulation program.
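The thesis's three kinds of information, constituency, semantic category, and structural relation, suggest a simple data structure in which composition is checked against what the language can express. The category names and the composition table below are simplified inventions, not the thesis's actual inventory.

```python
from dataclasses import dataclass, field

# Which semantic categories may fill which structural relations:
# a stand-in for Text Structure's composition constraints, so the
# planner can only build expressible plans. Table is invented.
ALLOWED = {
    "argument": {"object", "event"},
    "adjunct":  {"property", "event"},
}

@dataclass
class Constituent:
    head: str
    category: str          # semantic category: "event", "property", "object"
    children: list = field(default_factory=list)  # (relation, Constituent)

    def attach(self, relation, child):
        """Refuse compositions the language cannot express."""
        if child.category not in ALLOWED[relation]:
            raise ValueError(f"{child.category} cannot fill {relation}")
        self.children.append((relation, child))
        return self

plan = Constituent("walk", "event")
plan.attach("argument", Constituent("Karen", "object"))
plan.attach("adjunct", Constituent("slow", "property"))
print([(rel, c.head) for rel, c in plan.children])
```

The key design point mirrors the thesis: the planner chooses among whole abstract resources (constituent plus category plus relation) rather than free feature combinations, so an inexpressible plan is rejected at composition time instead of failing in the linguistic realizer.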